knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
library(tidyverse)
library(here)
library(broom)
# Time series packages
library(tsibble)
library(feasts)
library(fable)
To reinforce skills for wrangling, visualizing, and forecasting with time series data, we will use data on US residential energy consumption from January 1973 - October 2017 (from the US Energy Information Administration).
Read in data and convert to a time series
energy <- read_csv(here("data", "energy.csv"))
Explore the energy object as it currently exists. Notice that there is a column month that contains the month name and 4-digit year. Currently, however, R understands that as a character (instead of as a date). Our next step is to convert it into a time series data frame (a tsibble), in two steps:
1. Add a new column (date) that is the month column converted to a time series class, yearmonth
2. Convert the data frame to a tsibble, with that date column as the time index
Here’s what that looks like in a piped sequence:
# Convert to a time series data frame (tsibble)
energy_ts <- energy %>%
  mutate(date = tsibble::yearmonth(month)) %>% # convert the month column to a yearmonth class
  as_tsibble(key = NULL, index = date) # the index is the column used as the time index
Now that it’s stored as a tsibble, we can start visualizing, exploring and working with it a bit easier.
Let’s take a quick look at our tsibble (for residential energy use, in trillion BTU):
ggplot(data = energy_ts, aes(x = date, y = res_total)) +
  geom_line() +
  labs(y = "Residential energy consumption \n (Trillion BTU)")
Looks like there are some interesting things happening. We should ask: Is there an overall trend? Is there seasonality? Any cyclicality? Any notable outliers?
The big ones to notice quickly here are an overall increasing trend, and clear, strong seasonality (a pattern that repeats within each year).
A seasonplot can help point out seasonal patterns, and help to glean insights over the years. We’ll use feasts::gg_season() to create an exploratory seasonplot, which has month on the x-axis, energy consumption on the y-axis, and each year is its own series (mapped by line color).
energy_ts %>%
  gg_season(y = res_total) + # gg_season() comes from the feasts package
  theme_minimal() +
  scale_color_viridis_c() +
  labs(x = "month",
       y = "residential energy consumption (trillion BTU)")
This is really useful for us to explore both seasonal patterns, and how those seasonal patterns have changed over the years of this data (1973 - 2017). What are the major takeaways from this seasonplot?
Let’s explore the data a couple more ways:
energy_ts %>% gg_subseries(res_total) # a separate panel for each month, showing that month's values across years
Our takeaway here is similar: there is clear seasonality (higher values in winter months), with an increasingly evident second peak in June/July/August. This reinforces our takeaways from the raw data and seasonplots.
See Rob Hyndman’s section on STL decomposition to learn how it compares to the classical decomposition we did last week: “STL is a versatile and robust method for decomposing time series. STL is an acronym for ‘Seasonal and Trend decomposition using Loess’, while loess is a method for estimating nonlinear relationships.”
Notice that it allows seasonality to vary over time (a major difference from classical decomposition, and important here since we do see changes in seasonality).
# Find the STL decomposition
dcmp <- energy_ts %>%
  model(STL(res_total ~ season())) # model residential total as a function of season
# View the components
# components(dcmp) # returns a table with the trend, season_year, and remainder columns
# Visualize the decomposed components
components(dcmp) %>%
autoplot() +
theme_minimal()
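If you want the components as numbers rather than a plot, components() returns a regular-looking table you can work with directly, including a seasonally adjusted series. A minimal sketch (assuming the dcmp object created above):

```r
# Extract the components table (includes trend, season_year, remainder,
# and season_adjust = data minus the seasonal component)
energy_components <- components(dcmp)

# Plot the seasonally adjusted series over the raw data
ggplot(data = energy_components, aes(x = date)) +
  geom_line(aes(y = res_total), color = "gray70") +
  geom_line(aes(y = season_adjust), color = "blue") +
  theme_minimal()
```

With the seasonal ups and downs removed, the long-term trend is much easier to see.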
NOTE: the grey bars on the left of each panel show the relative scale of the components: each bar represents the same absolute range, so a longer bar means that component spans less variation. Here, the remainder spans a much smaller range than the data, trend, and seasonal components, meaning most of the variation is captured by the trend and seasonality.
We use the ACF to explore autocorrelation (here, we would expect seasonality to be clear from the ACF):
energy_ts %>%
  ACF(res_total) %>% # calculate the autocorrelation function of residential total
  autoplot()
And yep, we see that observations separated by 12 months are the most highly correlated, reflecting the strong seasonality we've seen in all of our other exploratory visualizations.
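A lag plot is another way to see the same structure: at lags where observations are highly correlated, points line up tightly along the 1:1 line. A quick sketch using feasts::gg_lag() (the lag choices here are just illustrative):

```r
# Scatterplots of the series against lagged versions of itself;
# lag 12 should show the tightest relationship, given annual seasonality
energy_ts %>%
  gg_lag(res_total, lags = c(1, 6, 12), geom = "point")
```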
Note: here we use ETS, which technically uses different optimization than Holt-Winters exponential smoothing, but is otherwise the same (From Rob Hyndman: “The model is equivalent to the one you are fitting with HoltWinters(), although the parameter estimation in ETS() uses MLE.”)
To create the model below, we specify the model type (exponential smoothing, ETS), then tell it what type of seasonality it should assume using the season("") expression, where “N” = non-seasonal (try changing it to this to see how unimpressive the forecast becomes!), “A” = additive, “M” = multiplicative. Here, we’ll say seasonality is multiplicative due to the change in variance over time and also within the secondary summer peak:
# Create the model:
energy_fit <- energy_ts %>%
  model(
    ets = ETS(res_total ~ season("M")) # ETS() comes from the fable package
  )
# We model residential total as a function of season; "M" specifies multiplicative
# seasonality, and ETS() optimizes the smoothing parameters (by MLE) for best fit
# across the time series
# Forecast using the model 10 years into the future:
energy_forecast <- energy_fit %>%
forecast(h = "10 years")
# Plot just the forecasted values (with 80 & 95% CIs):
energy_forecast %>%
autoplot()
# Or plot it added to the original data:
energy_forecast %>%
autoplot(energy_ts)
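If you need the forecast values and intervals as numbers (e.g., for a table in a report) rather than a plot, hilo() from fabletools converts the forecast distributions into interval columns. A sketch, assuming the energy_forecast object from above:

```r
# Convert the forecast distributions to 80% and 95% prediction intervals
energy_forecast %>%
  hilo(level = c(80, 95)) %>%
  head()
```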
We can use broom::augment() to append our original tsibble with what the model predicts the energy usage would be based on the model. Let’s do a little exploring through visualization.
First, use broom::augment() to get the predicted values & residuals:
# Append the predicted values (and residuals) to original energy data
energy_predicted <- broom::augment(energy_fit)
# Use View(energy_predicted) to see the resulting data frame
# .fitted contains the model's predicted values; .resid contains the residuals
Now, plot the actual energy values (res_total), and the predicted values (stored as .fitted) atop them:
ggplot(data = energy_predicted) +
geom_line(aes(x = date, y = res_total)) +
geom_line(aes(x = date, y = .fitted), color = "red", alpha = .7)
Cool, those look like pretty good predictions!
Now let’s explore the residuals. Remember, some important considerations: Residuals should be uncorrelated, centered at 0, and ideally normally distributed. One way we can check the distribution is with a histogram:
ggplot(data = energy_predicted, aes(x = .resid)) +
geom_histogram()
We see that this looks relatively normally distributed, and centered at 0 (we could find summary statistics beyond this to further explore).
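Residuals should also be uncorrelated; feasts::gg_tsresiduals() bundles the residual time plot, residual ACF, and histogram into a single diagnostic figure, so we can check all three considerations at once. A quick sketch, applied to the energy_fit model from above:

```r
# Time plot, ACF, and histogram of the innovation residuals in one figure
energy_fit %>%
  gg_tsresiduals()
```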
This is the END of what you are expected to complete for Part 1 on time series exploration and forecasting. Section E, below, shows how to use other forecasting models (seasonal naive, and autoregressive integrated moving average, the latter of which was not covered in ESM 244 this year).
There are a number of other forecasting methods and models! You can learn more about ETS forecasting, seasonal naive (SNAIVE) and autoregressive integrated moving average (ARIMA) from Hyndman’s book - those are the models that I show below.
# Fit 3 different forecasting models (ETS, ARIMA, SNAIVE):
energy_fit_multi <- energy_ts %>%
  model(
    ets = ETS(res_total ~ season("M")),
    arima = ARIMA(res_total),
    snaive = SNAIVE(res_total)
  )
# Forecast 3 years into the future (from data end date)
multi_forecast <- energy_fit_multi %>%
forecast(h = "3 years")
# Plot the 3 forecasts
multi_forecast %>%
autoplot(energy_ts)
# Or just view the forecasts (note the similarity across models):
multi_forecast %>%
autoplot()
We can see that all three of these models (exponential smoothing, seasonal naive, and ARIMA) yield similar forecasting results.
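Beyond eyeballing the forecast plots, the models can be compared numerically with fabletools::accuracy(), which reports error measures (RMSE, MAE, etc.) for each fitted model. A sketch (note this gives training-set accuracy; a fairer comparison would hold out a test set):

```r
# Training-set accuracy for each of the three fitted models,
# sorted so the lowest-RMSE model is on top
energy_fit_multi %>%
  accuracy() %>%
  arrange(RMSE)
```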